Feature Transfer Learning for Speech Emotion Recognition
Author
Abstract
Speech Emotion Recognition (SER) has made substantial progress over the past few decades, since the dawn of emotion and speech research. Many research efforts have aimed at human-like emotion recognition performance in real-life settings. However, because speech data are obtained from different devices under varied acquisition conditions, SER systems often face scenarios where an intrinsic distribution mismatch between the training and test data degrades their performance. To address this issue, this thesis uses autoencoders as expressive learners to introduce a set of novel feature transfer learning algorithms. Their goal is to obtain a matched feature-space representation for the target and source sets while ensuring that knowledge from the source domain is transferred. Partly inspired by the recent successes of feature learning, this thesis first incorporates sparse autoencoders into semi-supervised feature transfer learning. Furthermore, in the unsupervised setting, i.e., without any labeled target data in the training phase, it exploits denoising autoencoders, shared-hidden-layer autoencoders, adaptive denoising autoencoders, extreme learning machine autoencoders, and subspace learning with denoising autoencoders for feature transfer learning. Experimental results on a wide range of emotional speech databases demonstrate the advantages of the proposed algorithms over other modern transfer learning methods. Besides normally phonated speech, these transfer learning methods are also evaluated on whispered speech emotion recognition, which shows that they can be used to build a recognition model with a fully trainable architecture that adapts to a range of speech modalities.
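The abstract does not give architectural details, so the following is only a minimal NumPy sketch of the general idea behind the unsupervised setting: a single-hidden-layer denoising autoencoder is trained on pooled, unlabeled source and target features, and its hidden layer then serves as a shared representation for both domains. The class name `DenoisingAutoencoder`, the masking-noise level, layer sizes, learning rate, iteration count, and the synthetic Gaussian "features" are all illustrative assumptions, not the thesis's configuration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class DenoisingAutoencoder:
    """Hypothetical single-hidden-layer DAE: sigmoid encoder, linear decoder.

    Illustrative sketch only; not the implementation used in the thesis.
    """

    def __init__(self, n_in, n_hidden, corruption=0.2, lr=0.1, seed=0):
        self.rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(n_in)
        self.W1 = self.rng.normal(0.0, scale, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = self.rng.normal(0.0, scale, (n_hidden, n_in))
        self.b2 = np.zeros(n_in)
        self.corruption = corruption  # fraction of input entries zeroed per step
        self.lr = lr

    def encode(self, X):
        # Shared hidden representation, applied identically to both domains.
        return sigmoid(X @ self.W1 + self.b1)

    def decode(self, H):
        return H @ self.W2 + self.b2

    def reconstruction_error(self, X):
        # Mean squared error on clean (uncorrupted) input.
        return float(np.mean((self.decode(self.encode(X)) - X) ** 2))

    def train_step(self, X):
        # Masking noise: zero out a random fraction of the input entries.
        mask = self.rng.random(X.shape) >= self.corruption
        X_noisy = X * mask
        H = self.encode(X_noisy)
        X_hat = self.decode(H)
        # Gradient of 0.5 * sample-averaged squared error; the target is clean X.
        d_out = (X_hat - X) / X.shape[0]
        dW2 = H.T @ d_out
        db2 = d_out.sum(axis=0)
        d_hid = (d_out @ self.W2.T) * H * (1.0 - H)
        dW1 = X_noisy.T @ d_hid
        db1 = d_hid.sum(axis=0)
        # Plain full-batch gradient descent update.
        self.W2 -= self.lr * dW2
        self.b2 -= self.lr * db2
        self.W1 -= self.lr * dW1
        self.b1 -= self.lr * db1


# Synthetic stand-ins for acoustic feature vectors (assumption, not real data).
rng = np.random.default_rng(42)
X_source = rng.normal(0.0, 1.0, size=(200, 10))  # source domain (labels unused here)
X_target = rng.normal(0.5, 1.5, size=(200, 10))  # shifted, unlabeled target domain

# Unsupervised setting: train on pooled features without any target labels.
X_pool = np.vstack([X_source, X_target])
dae = DenoisingAutoencoder(n_in=10, n_hidden=8, corruption=0.2, lr=0.1)
err_before = dae.reconstruction_error(X_pool)
for _ in range(2000):
    dae.train_step(X_pool)
err_after = dae.reconstruction_error(X_pool)

# Both domains are mapped through the same learned encoder.
Z_source = dae.encode(X_source)
Z_target = dae.encode(X_target)
print(f"reconstruction MSE: {err_before:.3f} -> {err_after:.3f}")
```

In a full pipeline, a classifier would then be trained on `Z_source` together with the source emotion labels and applied to `Z_target`; that step is omitted here. The variants named in the abstract, such as shared-hidden-layer or adaptive denoising autoencoders, go beyond this plain sketch.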
Similar resources
A Database for Automatic Persian Speech Emotion Recognition: Collection, Processing and Evaluation
Recent developments in robotics automation have motivated researchers to improve the efficiency of interactive systems by enabling natural man-machine interaction. Since speech is the most popular method of communication, recognizing human emotions from the speech signal has become a challenging research topic known as Speech Emotion Recognition (SER). In this study, we propose a Persian em...
Improving of Feature Selection in Speech Emotion Recognition Based-on Hybrid Evolutionary Algorithms
One of the important issues in speech emotion recognition is selecting appropriate feature sets in order to improve the detection rate and classification accuracy. In previous studies, researchers tried to select appropriate features for classification using feature selection and feature-space reduction methods such as Fisher and PCA. In this research, a hybrid evolutionary algorit...
Learning Corpus-Invariant Discriminant Feature Representations for Speech Emotion Recognition
Speech emotion recognition, a hot topic in speech signal processing, has developed rapidly in recent years, and some satisfactory results have been achieved. However, it should be noted that most of these methods are trained and evaluated on the same corpus. In reality, the training data and testing data are often collected from different corpora, and the feature distributions of di...
Representation Learning for Speech Emotion Recognition
Speech emotion recognition is an important problem with applications as varied as human-computer interfaces and affective computing. Previous approaches to emotion recognition have mostly focused on extraction of carefully engineered features and have trained simple classifiers for the emotion task. There has been limited effort at representation learning for affect recognition, where features ...
Classification of emotional speech using spectral pattern features
Speech Emotion Recognition (SER) is a new and challenging research area with a wide range of applications in man-machine interactions. The aim of a SER system is to recognize human emotion by analyzing the acoustics of speech sound. In this study, we propose Spectral Pattern features (SPs) and Harmonic Energy features (HEs) for emotion recognition. These features extracted from the spectrogram ...